# Usage
This lab is the small language experiment for MRCJ (Multi-round Conversational Jailbreaking). To run it, pass the command-line arguments described below to configure your models.

Before running the experiment, you need to replace the openai_api_key and opensourced_api_key in both the train.py and test.py files with valid API keys.
If you are not using locally deployed proprietary models, you will also need to update the base_url_mapping as shown below:

base_url_mapping = {
    # Add your URL mappings here
}

Ensure that the correct URLs corresponding to the models you are using are included in the mapping.
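As a rough illustration, a filled-in mapping might look like the sketch below. It assumes the mapping is keyed by the --target_model name, which this README does not confirm. The URLs are hypothetical placeholders, and resolve_base_url is an illustrative helper that is not part of train.py or test.py.

```python
# Hypothetical placeholder endpoints; replace them with the URLs of your own deployments.
# Assumption: keys are the names accepted by --target_model.
base_url_mapping = {
    "llama": "http://localhost:8000/v1",
    "qwen": "http://localhost:8001/v1",
}


def resolve_base_url(model_name, mapping=base_url_mapping):
    """Return the endpoint for model_name, failing early if it is missing."""
    try:
        return mapping[model_name]
    except KeyError:
        # Illustrative check only: surface a missing entry before any request is sent.
        raise KeyError(f"No base URL configured for model '{model_name}'") from None


print(resolve_base_url("llama"))
```

Failing early on a missing entry makes a forgotten URL visible at startup rather than partway through a run.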


Here's how you can run the script from the terminal:
## TRAIN
python /train/train.py --target_model [MODEL] --judge_model [JUDGE_MODEL] --cat_questions_num [NUMBER] --harmful_score_threshold [SCORE] --train_from_scratch --dir [DIRECTORY]

### Arguments:
--target_model: Specify the target model from the available options (mistral, chatglm, llama, qwen, vicuna-7b, vicuna-13b). This argument is required.
--judge_model: Choose the judge model from the options (gpt-4o, gpt-4o-mini). The default is gpt-4o.
--cat_questions_num: Sets the number of categorical questions. The default is 2.
--harmful_score_threshold: Sets the threshold for harmful scores, which is used to filter outputs based on defined criteria. The default value is 3.
--train_from_scratch: Add this flag to start training from scratch. If omitted, the script will load from an existing checkpoint.
--dir: Specify the directory under which to save data. This argument is required.
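
The arguments above could be declared with argparse roughly as follows. This is a sketch reconstructed from the descriptions in this README, not the actual parser in train.py. The function name build_train_parser and the integer types for the two numeric options are assumptions.

```python
import argparse

TARGET_MODELS = ["mistral", "chatglm", "llama", "qwen", "vicuna-7b", "vicuna-13b"]
JUDGE_MODELS = ["gpt-4o", "gpt-4o-mini"]


def build_train_parser():
    """Build a parser that mirrors the documented train.py arguments (illustrative)."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py command line")
    parser.add_argument("--target_model", required=True, choices=TARGET_MODELS)
    parser.add_argument("--judge_model", default="gpt-4o", choices=JUDGE_MODELS)
    # Assumption: both numeric options are integers; the README only gives the defaults.
    parser.add_argument("--cat_questions_num", type=int, default=2)
    parser.add_argument("--harmful_score_threshold", type=int, default=3)
    # A store_true flag is False when omitted, matching "load from an existing checkpoint".
    parser.add_argument("--train_from_scratch", action="store_true")
    parser.add_argument("--dir", required=True)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_train_parser().parse_args(["--target_model", "llama", "--dir", "my_model_data"])
print(vars(args))
```

Passing only the two required arguments, as in the last lines, leaves every other option at its documented default.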

### Example Command
python /train/train.py --target_model llama --judge_model gpt-4o-mini --cat_questions_num 5 --harmful_score_threshold 4 --train_from_scratch --dir my_model_data

This example sets up the llama model, uses gpt-4o-mini as the judge model, asks 5 categorical questions, sets a harmful score threshold of 4, trains from scratch, and saves data in /train/tmp/my_model_data/.


## TEST
python /test/test.py --target_model [MODEL] --judge_model [JUDGE_MODEL] 

### Arguments:
--target_model: Specify the target model from the available options (mistral, chatglm, llama, qwen, vicuna-7b, vicuna-13b). This argument is required.
--judge_model: Choose the judge model from the options (gpt-4o, gpt-4o-mini). The default is gpt-4o.

### Example Command
python /test/test.py --target_model llama --judge_model gpt-4o-mini 

This example sets up the llama model, uses gpt-4o-mini as the judge model, and uses the conversations stored under the dir specified in the config file '/train/tmp/config.txt'.

